In the Specification:

Change the paragraph that begins on page 1, line 9 to: 

A speech recognizer trained with relatively quiet office environment speech data
and then operating in a mobile environment may fail due to at least the two
distortion sources of background noise and microphone changes. The background noise
may, for example, be from a computer fan, car engine, and/or road noise. The
microphone changes may be due to the quality of the microphone, whether the
microphone is hand-held or hands-free, and the position of the microphone relative to the
mouth. In mobile applications of speech recognition, both the microphone conditions
and background noise are subject to change.

Change the paragraph that begins on page 1, line 17 and continues to the end of the page 
to: 

Cepstral Mean Normalization (CMN) removes the utterance mean and is a simple and
effective way of dealing with convolutive distortion such as telephone channel distortion.
See "Effectiveness of Linear Prediction Characteristics of the Speech Wave for
Automatic Speaker Identification and Verification" by B. Atal in Journal of the Acoustical
Society of America, Vol. 55: 1304-1312, 1974. Spectral Subtraction (SS) reduces
background noise in the feature space. See "Suppression of Acoustic Noise in
Speech Using Spectral Subtraction" by S.F. Boll in IEEE Transactions on Acoustics,
Speech and Signal Processing, ASSP-27(2): 113-129, April 1979. Parallel Model
Combination (PMC) gives an approximation of speech models in noisy conditions from
noise-free speech models and noise estimates. See "An Improved Approach to the
Hidden Markov Model Decomposition of Speech and Noise" by M.J.F. Gales and
S. Young in Proceedings of IEEE International Conference on Acoustics, Speech and
Signal Processing, Volume 1, pages 233-236, U.S.A., April 1992. These techniques do not
require any training data.
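
As an illustration of why CMN handles convolutive distortion, the following minimal sketch subtracts the utterance mean from a few toy cepstral frames. It is not taken from the specification: the function name `cepstral_mean_normalize` and the sample values are assumptions chosen for illustration. A fixed channel appears as a constant offset in the cepstral domain, so the offset vanishes after normalization.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean from every cepstral frame."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    normalized = [[f[d] - mean[d] for d in range(dim)] for f in frames]
    return normalized, mean


# Toy utterance (hypothetical values): three 2-dimensional cepstral frames.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]

# A fixed channel adds the same cepstral offset to every frame.
channel = [0.5, -1.0]
distorted = [[c + h for c, h in zip(f, channel)] for f in clean]

norm_clean, _ = cepstral_mean_normalize(clean)
norm_distorted, _ = cepstral_mean_normalize(distorted)
# Both normalized utterances now agree frame by frame.
```

Background noise is additive in the spectral domain rather than the cepstral domain, so this subtraction alone does not remove it.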

Change the paragraph that begins at the top of page 2, line 2 to line 21 to: 

Joint compensation of additive noise and convolutive noise can be
achieved by the introduction of a channel model and a noise model. A spectral bias for
additive noise and a cepstral bias for convolutive noise are introduced in an article by M.
Afify, Y. Gong, and J. P. Haton. This article is entitled "A General Joint Additive and
Convolutive Bias Compensation Approach Applied to Noisy Lombard Speech
Recognition" in IEEE Trans. on Speech and Audio Processing, 6(6): 524-538, November
1998. The two biases can be calculated by application of Expectation Maximization
(EM) in both spectral and convolutive domains. A procedure by J.L. Gauvain et al. is
presented to calculate the convolutive component, which requires rescanning of training
data. See J.L. Gauvain, L. Lamel, M. Adda-Decker, and D. Matrouf, "Developments in
Continuous Speech Dictation using the ARPA NAB News Task," in
Proc. of IEEE International Conference on Acoustics, Speech, and Signal Processing,
pages 73-76, Detroit, 1996. Solution of the convolutive component by a steepest descent
method has also been reported. See Y. Minami and S. Furui, "A Maximum
Likelihood Procedure for a Universal Adaptation Method Based on HMM Composition,"
in Proc. of IEEE International Conference on Acoustics, Speech and Signal Processing,
pages 129-132, Detroit, 1995. A method by Y. Minami and S. Furui needs additional
universal speech models, and re-estimation of channel distortion with the
universal models when the channel changes. See Y. Minami and S. Furui,
"Adaptation Method Based on HMM Composition and EM Algorithm," in Proceedings of
IEEE International Conference on Acoustics, Speech and Signal Processing, pages 327-330,
Atlanta, 1996.

Change the paragraph that begins on page 2, line 28 and continues through page 3, line 5 
to: 

Alternatively, the nonlinear changes of both types of distortion can be
approximated by linear equations, assuming that the changes are small. A Jacobian
approach, which models speech model parameter changes as the product of a Jacobian
matrix and the difference in noisy conditions, and statistical linear approximation are
along this direction. See S. Sagayama, Y. Yamaguchi, and S. Takahashi,
"Jacobian Adaptation of Noisy Speech Models," in Proceedings of IEEE Automatic
Speech Recognition Workshop, pages 396-403, Santa Barbara, CA, USA, December
1997. IEEE Signal Processing Society. Also see "Statistical Linear Approximation for
Environment Compensation" by N.S. Kim, IEEE Signal Processing Letters, 5(1): 8-10,
January 1998.



Change the paragraph that begins on page 3, line 15 as follows: 

In accordance with one embodiment of the present invention, a new
method is disclosed that simultaneously handles noise and channel distortions to
make a speaker independent system robust to a wide variety of noises and channel
distortions.

Change the paragraph on page 3, line 25 to: 

Figure 2 illustrates the method of the present invention.

Change the paragraph that begins on page 4, line 3 to: 

Referring to Fig. 1 there is illustrated a speech recognizer according to the present
invention. The speech is applied to recognizer 11. The speech is compared to Hidden
Markov Models (HMM) 13 to recognize the text. The models initially provided are
those based on speech recorded in a quiet environment with a microphone of
good quality. We want to develop a speech model set suitable for operating in the
simultaneous presence of channel/microphone distortion and background noise. In
accordance with the present invention, a speech model set is provided using statistics
about the noise and speech.

Change the paragraph that begins on page 4, line 12 as follows: 

Referring to Figure 2, Step 1 is to start with HMM models trained on
clean speech, with cepstral mean normalization. We modify these models to get models
that compensate for channel/microphone distortion (convolutive distortion) and
simultaneous background noise (additive distortion). The HMM modeling method of this
invention represents the acoustic probability density function (PDF) corresponding to
each HMM state as a mixture of Gaussian components, as is well known in the art.
Such HMM models have many parameters, such as Gaussian component
mean vectors, covariances, and mixture component weights for each state, as well as
HMM state transition probabilities. The method of this invention teaches modifying
the mean vectors $m_{p,j,k}$ of the original model space,
where $p$ is the index of the HMM, $j$ is the state and $k$
is the mixing component.

Change the paragraph that begins on page 4, line 20 as follows: 

Step 2 is to calculate $b$, which is the mean mel-scaled cepstrum
coefficients (MFCC) vector over the training database. Scan all data and calculate the
mean to get $b$.

Change the paragraph that begins on page 4, line 23 as follows: 

Step 3 is to add the mean $b$ to each vector of the mean vector pool, represented
by $m_{p,j,k}$, as in equation (1), to get:

$$\bar{m}_{p,j,k} = m_{p,j,k} + b. \tag{1}$$
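
Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the specification's implementation: the training data is assumed to be a list of MFCC frames, the mean vector pool is assumed to be a dictionary keyed by `(p, j, k)`, and the names `mean_vector` and `add_bias` are hypothetical.

```python
def mean_vector(frames):
    """Step 2: average MFCC vector over all training frames."""
    n = len(frames)
    return [sum(f[d] for f in frames) / n for d in range(len(frames[0]))]


def add_bias(means, b):
    """Step 3, equation (1): add b to every mean vector in the pool."""
    return {key: [m_d + b_d for m_d, b_d in zip(m, b)]
            for key, m in means.items()}


# Toy data (hypothetical values): two training frames, two Gaussian means.
training_frames = [[1.0, 0.0], [3.0, 2.0]]
means = {(0, 0, 0): [0.5, -0.5], (0, 0, 1): [-0.5, 0.5]}

b = mean_vector(training_frames)   # mean MFCC vector of the training data
m_bar = add_bias(means, b)         # models that are no longer mean normalized
```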

Change the paragraph that begins on page 4, line 28 to: 

For example, there could be 100 HMMs, 3 states per HMM and 2 vectors per
state, or a total of 600 vectors.

Change the paragraph that begins on page 5, line 15 to: 

In Step 5, we calculate the mean vectors adapted to the noise $X$ using equation 4.

$$\hat{m}_{p,j,k} = \mathrm{IDFT}(\mathrm{DFT}(\bar{m}_{p,j,k}) \oplus \mathrm{DFT}(X)). \tag{4}$$

where DFT and IDFT are, respectively, the DFT and inverse DFT operation, and $\hat{m}_{p,j,k}$
is the noise compensated mean vector.

Change the paragraph that begins on page 5, line 21 and ends on page 6, line 2 to: 

Equation 4 involves several operators. DFT is the Discrete Fourier Transform and IDFT
is the Inverse Discrete Fourier Transform, which are respectively used to convert from
the cepstrum domain to the log spectrum domain, and vice versa. The $\oplus$ is an operation
applied to two log spectral vectors to produce a log spectral vector representing the linear
sum of spectra. The operation $\oplus$ is defined
by equations 2 and 3. Equation 2 defines the operation $\oplus$, which operates on
two $D$ dimensional vectors $u$ and $v$, and the result is a vector of $D$ dimensions,
$[w_1, w_2, \ldots, w_D]^T$, where $T$ is the transposition. Equation 3
defines the $j$th element in that vector ($w_j$).
This completes the definition of Equation 4.
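
The operators of Equation 4 can be sketched as follows. This is a hedged illustration rather than the specification's implementation. It assumes that each element of the combined vector is the log of the sum of the exponentials of the corresponding elements of the two inputs, which is what "linear sum of spectra" implies. It also assumes a square orthonormal DCT-II matrix as a stand-in for the DFT/IDFT pair that moves between the cepstrum and log spectrum domains. All function names are hypothetical.

```python
import math


def log_add(u, v):
    """Element-wise log(exp(u_j) + exp(v_j)), computed stably."""
    out = []
    for a, b in zip(u, v):
        hi, lo = max(a, b), min(a, b)
        out.append(hi + math.log1p(math.exp(lo - hi)))
    return out


def dct_matrix(dim):
    """Orthonormal DCT-II matrix; its rows map log spectrum to cepstrum."""
    rows = []
    for i in range(dim):
        scale = math.sqrt((1.0 if i == 0 else 2.0) / dim)
        rows.append([scale * math.cos(math.pi * i * (2 * n + 1) / (2 * dim))
                     for n in range(dim)])
    return rows


def to_log_spectrum(cep, C):
    """Cepstrum to log spectrum (the role of "DFT" in Equation 4)."""
    dim = len(cep)
    return [sum(C[i][n] * cep[i] for i in range(dim)) for n in range(dim)]


def to_cepstrum(logspec, C):
    """Log spectrum to cepstrum (the role of "IDFT" in Equation 4)."""
    dim = len(logspec)
    return [sum(C[i][n] * logspec[n] for n in range(dim)) for i in range(dim)]


def compensate_mean(m_bar, noise_cep, C):
    """Equation 4: combine a model mean with a noise estimate."""
    combined = log_add(to_log_spectrum(m_bar, C), to_log_spectrum(noise_cep, C))
    return to_cepstrum(combined, C)


C = dct_matrix(4)
m_bar = [1.0, 0.5, -0.2, 0.1]              # hypothetical model mean
quiet = to_cepstrum([-50.0] * 4, C)        # noise far below the speech level
m_hat = compensate_mean(m_bar, quiet, C)   # stays very close to m_bar
```

When the noise level is negligible the compensated mean is essentially unchanged, and as the noise grows the compensated mean moves toward the noise spectrum.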

Change the paragraph that begins on page 6, line 4 to: 

In the following steps, we need to remove the mean vector $\hat{b}$ of the noisy data $y$
over the noisy speech space (from the resultant model). One may be able to
synthesize enough noisy data from compensated models but this requires a lot of
calculation. In accordance with the present invention the vector is calculated using
statistics of the noisy models. The whole recognizer will operate with CMN (cepstral
mean normalization mode), but the models in Equation 4 are no longer mean normalized.
We have dealt with additive noise. The second half of the processing is removing the
cepstral mean of our models defined in Equation 4. This is not difficult because we have
the models in Equation 4. In Step 6, we need to integrate all the samples generated by
Equation 4 to get the mean $\hat{b}$. Equation 5 is this integration.

Change the paragraph that begins on page 6, line 14 to:

Let $\mathcal{H}$ be the variable denoting HMM index, $\mathcal{J}$ be the variable for state index, and $\mathcal{K}$
be the variable for mixing component index.



Change the paragraph that begins on page 7, line 1 to: 

Equation 7 shows that $\hat{b}$ can be worked out analytically, and it is not necessary to do the
physical generation and integration. The final result is represented by Equation 7,
which reduces the integration into sums over HMMs, over states and over mixing
components. Finally the estimated noise-compensated channel bias, $\hat{b}$, is removed
from the compensated model means to get the target model means. This is Step 7.
The target model is:



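$$\dot{m}_{p,j,k} = \hat{m}_{p,j,k} - \hat{b}, \tag{8}$$

where $\hat{b} = E\{y\}$ is the mean of the noisy data defined by Equation 5.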
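
Steps 6 and 7 can be sketched as follows, assuming the probability-weighted triple sum over HMMs, states and mixing components that the text attributes to Equation 7, followed by subtraction of the resulting bias from each compensated mean. The dictionary layout and the names `noisy_mean` and `remove_bias` are assumptions made for illustration.

```python
def noisy_mean(m_hat, p_hmm, p_state, p_mix):
    """Probability-weighted sum of compensated means over (p, j, k)."""
    dim = len(next(iter(m_hat.values())))
    b_hat = [0.0] * dim
    for (p, j, k), mean in m_hat.items():
        weight = p_hmm[p] * p_state[(p, j)] * p_mix[(p, j, k)]
        for d in range(dim):
            b_hat[d] += weight * mean[d]
    return b_hat


def remove_bias(m_hat, b_hat):
    """Step 7: subtract the bias from every compensated mean."""
    return {key: [x - y for x, y in zip(mean, b_hat)]
            for key, mean in m_hat.items()}


# Toy model (hypothetical values): two HMMs, one state and one component each.
m_hat = {(0, 0, 0): [4.0, 0.0], (1, 0, 0): [0.0, 4.0]}
p_hmm = {0: 0.25, 1: 0.75}
p_state = {(0, 0): 1.0, (1, 0): 1.0}
p_mix = {(0, 0, 0): 1.0, (1, 0, 0): 1.0}

b_hat = noisy_mean(m_hat, p_hmm, p_state, p_mix)
target = remove_bias(m_hat, b_hat)
```

No noisy data is synthesized here: the bias comes entirely from model statistics, which is the saving the paragraph above describes.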



Change the paragraph that begins on page 7, line 14 to: 

The resulting target model means are the desired modified parameters of the HMM
models used in the recognizer. This operation is done
for each utterance. Figure 2 illustrates that for a next utterance (Step 8) the process starts
with Step 4.

Change the paragraph that begins on page 7, line 16 to: 

Calculation of $\hat{b}$ thus requires the knowledge of the probabilities of each PDF.
There are two issues with the probabilities:

• They need additional storage space.

• They are dependent on the recognition task, e.g. vocabulary, grammar.

Change the paragraph beginning on page 7, line 22 to:

Although it is possible to obtain the probabilities, we can also consider the
following simplified cases.

Change the paragraph beginning on page 7, line 24 and continuing to page 8, line 15: 

The operations to calculate $\hat{b}$ can be simplified by assuming
equal probabilities for $P_{\mathcal{H}}(p)$, $P_{\mathcal{J}|\mathcal{H}}(j|p)$ and $P_{\mathcal{K}|\mathcal{H},\mathcal{J}}(k|p,j)$:

$$
\begin{aligned}
P_{\mathcal{H}}(p) &= C \\
P_{\mathcal{J}|\mathcal{H}}(j|p) &= D \\
P_{\mathcal{K}|\mathcal{H},\mathcal{J}}(k|p,j) &= E
\end{aligned}
\tag{10}
$$

C, D and E are selected such that they represent equal probabilities. Therefore we
have the following: C is chosen such that each HMM is equally
likely, so C = 1/(number of HMM models); D is chosen such that each state of a given
HMM is equally likely, where the HMM is indexed by p, so D = 1/(number of states in
HMM(p)); and E is chosen such that each mixing component of a state of an HMM is
equally likely, where the state of an HMM is indexed by j, so E = 1/(number of mixing
components in HMM(p) state(j)).
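
The choice of C, D and E can be sketched as follows. This is an illustrative sketch only: the `topology` list, which gives the number of mixing components for each state of each HMM, and the function name `equal_probabilities` are assumptions, not part of the specification.

```python
def equal_probabilities(topology):
    """topology[p][j] = number of mixing components in state j of HMM p."""
    C = 1.0 / len(topology)                 # each HMM equally likely
    p_hmm, p_state, p_mix = {}, {}, {}
    for p, states in enumerate(topology):
        p_hmm[p] = C
        D = 1.0 / len(states)               # each state of HMM p equally likely
        for j, n_mix in enumerate(states):
            p_state[(p, j)] = D
            E = 1.0 / n_mix                 # each component of state j equally likely
            for k in range(n_mix):
                p_mix[(p, j, k)] = E
    return p_hmm, p_state, p_mix


# The sizes from the earlier example: 100 HMMs, 3 states each, 2 vectors per state.
topology = [[2, 2, 2] for _ in range(100)]
p_hmm, p_state, p_mix = equal_probabilities(topology)

# Every one of the 600 mean vectors receives the same weight C * D * E.
weight = p_hmm[0] * p_state[(0, 0)] * p_mix[(0, 0, 0)]
```

With uniform model sizes the weighted sum reduces to a plain average of the compensated mean vectors, so no task-dependent probabilities need to be stored.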

In fact, the case described in Eq-10 consists in averaging the
compensated mean vectors $\hat{m}_{p,j,k}$. Referring to Eq-4 and Eq-1, it can be expected that
the averaging reduces the speech part $m_{p,j,k}$ just as CMN does. Therefore, Eq-7 could be
further simplified into:

$$\hat{b} = \mathrm{IDFT}(\mathrm{DFT}(b) \oplus \mathrm{DFT}(X)). \tag{11}$$

The model $\dot{m}_{p,j,k}$ of Eq-8 is then used with CMN on noisy speech.
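
The simplification of Eq-11 replaces the sum over all mean vectors with a single combination of the clean mean and the noise estimate. The sketch below is illustrative only. For brevity it assumes 2-dimensional vectors and uses the 2-point orthonormal DCT-II, which is its own inverse, as a stand-in for the DFT/IDFT pair. The names `transform`, `log_add` and `simplified_noisy_mean` are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)
C = [[S, S], [S, -S]]   # 2-point orthonormal DCT-II; symmetric, so its own inverse


def transform(vec):
    """Apply C; serves as both directions of the domain change here."""
    return [C[0][0] * vec[0] + C[0][1] * vec[1],
            C[1][0] * vec[0] + C[1][1] * vec[1]]


def log_add(u, v):
    """Element-wise log(exp(u_j) + exp(v_j)), computed stably."""
    return [max(a, b) + math.log1p(math.exp(-abs(a - b))) for a, b in zip(u, v)]


def simplified_noisy_mean(b, noise_cep):
    """Eq-11 sketch: one log-add in the log-spectral domain."""
    return transform(log_add(transform(b), transform(noise_cep)))


b = [1.0, 0.5]                          # hypothetical clean-speech mean
b_hat = simplified_noisy_mean(b, b)     # noise at the same level as the speech mean
```

The cost is one combination per utterance, independent of the number of HMMs, states and mixing components.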

Change the paragraph that begins on page 8, line 17 to: 

A subset of the WAVES database containing recordings in a car was used.

Remove the paragraph beginning on page 8, line 21 starting with "In each session" and 
ending with "10 dynamic coefficients". 

Change the paragraph beginning on page 8, line 26 as follows: 

HMMs used in all experiments were trained using TIDIGITS clean speech data.
Utterance-based cepstral mean normalization was used.

Remove the paragraph beginning on page 9, line 1 starting with "To improve" and
ending with "JAC (joint compensation of additive noise and convolutive distortion)".

On page 9, remove the Table 1 beginning on line 6 and description on lines 9 and 10. 

Change the paragraph that begins on page 9, line 12 to the end of the specification on 
page 10 at line 8 as follows: 

Experimental results show that the new invented method reduces word error rate by
61% to 94% relative to baseline performance, depending on the driving condition, and the
method is superior to other reported methods.